- Posted-By: auto-faq 3.1.1.2
- Archive-name: graphics/fileformats-faq/part4
- Posting-Frequency: monthly
- Last-modified: 01Mar95
-
- This FAQ (Frequently Asked Questions) list contains information on graphics
- file formats, including raster, vector, metafile, Page Description Language,
- 3D object, animation, and multimedia formats.
-
- This FAQ is divided into four parts, each covering a different area of
- graphics file format information:
-
- Graphics File Formats FAQ: General Graphics Format Questions (Part 1 of 4)
- Graphics File Formats FAQ: Image Conversion and Display Programs (Part 2 of 4)
- Graphics File Formats FAQ: Where to Get File Format Specifications (Part 3 of 4)
- Graphics File Formats FAQ: Tips and Tricks of the Trade (Part 4 of 4)
-
- Please email contributions, corrections, and suggestions about this FAQ to
- jdm@netcom.com. Relevant information posted to newsgroups will not
- automatically make it into this FAQ.
-
- -- James D. Murray (jdm@netcom.com) ;-{)>>>>
-
- ----------------------------------------------------------------------
-
- Subject: 0. Contents of Tips and Tricks of the Trade
-
- Subjects marked with <NEW> are new to this FAQ.
- Subjects marked with <UPD> have been updated since the last release
- of this FAQ.
-
- I. General questions about this FAQ
-
- 0. Maintainer's Comments
- 1. What's new in this latest FAQ release?
-
- II. Programming Tips for Graphics File Formats
-
- 0. What's the best way to read a file header?
- 1. What's this business about endianness? <UPD>
- 2. How can I determine the byte-order of a system at run-time? <NEW>
- 3. How can I identify the format of a graphics file?
-
- III. Kudos and Assertions
-
- 0. Acknowledgments
- 1. About The Author
- 2. Disclaimer
- 3. Copyright Notice
-
- ------------------------------
-
- Subject: I. General questions about this FAQ
-
- ------------------------------
-
- Subject: 0. Maintainer's Comments
-
- Programmers are code-hungry people. They just want the secrets and they want
- them to work NOW! But always in the back of a hack's mind there are the
- questions: "Is this really the best way to do this? Could it be better?"
-
- This FAQ is to share ideas on the implementation details of reading, writing,
- converting, and displaying graphics file formats. You'll probably get some
- good ideas here, find a few things you didn't know about, and even have a few
- suggestions and improvements of your own to add (send them to jdm@netcom.com).
-
- If you need to know the best way to do something with file formats, or just
- find it embarrassing to implement a chunk of some other programmer's code and
- then have to admit you really don't understand how it works, then this FAQ is
- for you.
-
- ------------------------------
-
- Subject: 1. What's new in this latest FAQ release?
-
- o First release of this new FAQ part!
-
- ------------------------------
-
- Subject: II. Programming Tips for Graphics File Formats
-
- ------------------------------
-
- Subject: 0. What's the best way to read a file header?
-
- You wouldn't think there's a lot of mystery about reading a few bytes from a
- disk file, eh? Programmers, however, are constantly losing time because
- they don't consider a few problems that may occur along the way.
- Consider the following code:
-
- typedef struct _Header
- {
- BYTE Id;
- WORD Height;
- WORD Width;
- BYTE Colors;
- } HEADER;
-
- HEADER Header;
-
- void ReadHeader(FILE *fp)
- {
- if (fp != (FILE *)NULL)
- fread(&Header, sizeof(HEADER), 1, fp);
- }
-
- Looks good, right? The fread() will read the next sizeof(HEADER) bytes from a
- valid FILE pointer into the Header data structure. So what could go wrong?
-
- The problem often encountered with this method is one of element alignment
- within structures. Compilers may pad structures with "invisible" elements to
- allow each "visible" element to align on a 2- or 4-byte address boundary.
- This is done for efficiency in accessing the element while in memory. Padding
- may also be added to the end of the structure to bring its total length to
- an even number of bytes. This is done so the data following the structure in
- memory will also align on a proper address boundary.
-
- If the above code is compiled with no (or 1-byte) structure alignment the
- code will operate as expected. With 2-byte alignment an extra two bytes would
- be added to the HEADER structure in memory and make it appear as such:
-
- typedef struct _Header
- {
- BYTE Id;
- BYTE Pad1; // Added padding
- WORD Height;
- WORD Width;
- BYTE Colors;
- BYTE Pad2; // Added padding
- } HEADER;
-
- As you can see the fread() will store the correct value in Id, but the first
- byte of Height will be stored in the padding byte. This will throw off the
- correct storage of data in the remaining part of the structure causing the
- values to be garbage.
-
- A compiler using 4-byte alignment would change the HEADER in memory as such:
-
- typedef struct _Header
- {
- BYTE Id;
- BYTE Pad1; // Added padding
- BYTE Pad2; // Added padding
- BYTE Pad3; // Added padding
- WORD Height;
- WORD Width;
- BYTE Colors;
- BYTE Pad4; // Added padding
- BYTE Pad5; // Added padding
- BYTE Pad6; // Added padding
- } HEADER;
-
- What started off as a 6-byte header increased to 8 and 12 bytes thanks to
- alignment. But what can you do? All the documentation and makefiles you write
- will not prevent someone from compiling with the wrong options flag and then
- pulling their (or your) hair out when your software appears not to work
- correctly.
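
One way to find out whether a particular compiler has padded the structure is to compare sizeof(HEADER) with the number of bytes the header occupies in the file. The following is only a sketch under stated assumptions: this FAQ never defines BYTE and WORD, so the typedefs below are guesses, and HEADER_FILE_SIZE, HeaderPaddingBytes(), and PrintHeaderLayout() are names made up for illustration.

```c
#include <stddef.h>  /* size_t, offsetof */
#include <stdio.h>

typedef unsigned char  BYTE;  /* assumption: 8 bits  */
typedef unsigned short WORD;  /* assumption: 16 bits */

typedef struct HeaderTag
{
    BYTE Id;
    WORD Height;
    WORD Width;
    BYTE Colors;
} HEADER;

/* Size of the header as stored in the file: 1 + 2 + 2 + 1 bytes. */
#define HEADER_FILE_SIZE 6

/* Number of padding bytes this compiler added to HEADER (0 if none). */
size_t HeaderPaddingBytes(void)
{
    return sizeof(HEADER) - HEADER_FILE_SIZE;
}

/* Print the size of HEADER and where each field actually lives in memory. */
void PrintHeaderLayout(void)
{
    printf("sizeof(HEADER) = %lu\n", (unsigned long) sizeof(HEADER));
    printf("Id      offset = %lu\n", (unsigned long) offsetof(HEADER, Id));
    printf("Height  offset = %lu\n", (unsigned long) offsetof(HEADER, Height));
    printf("Width   offset = %lu\n", (unsigned long) offsetof(HEADER, Width));
    printf("Colors  offset = %lu\n", (unsigned long) offsetof(HEADER, Colors));
}
```

If HeaderPaddingBytes() returns anything other than zero, a single fread() of sizeof(HEADER) bytes will not line up with the fields in the file.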
-
- Now consider this alternative to the ReadHeader() function:
-
- HEADER Header;
-
- void ReadHeader(FILE *fp)
- {
- if (fp != (FILE *)NULL)
- {
- fread(&Header.Id, sizeof(Header.Id), 1, fp);
- fread(&Header.Height, sizeof(Header.Height), 1, fp);
- fread(&Header.Width, sizeof(Header.Width), 1, fp);
- fread(&Header.Colors, sizeof(Header.Colors), 1, fp);
- }
- }
-
- What both you and your compiler now see is a lot more code. Rather than
- reading the entire structure in one, elegant shot, you read in each element
- separately using multiple calls to fread(). The trade-off here is increased
- code size for not caring what the structure alignment option of the compiler
- is set to. These cases are also true for writing structures to files using
- fwrite(). Write only the data and not the padding please.
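
Neither version of ReadHeader() looks at what fread() returns, so a truncated file goes unnoticed. A field-by-field reader that reports failure might look roughly like the sketch below. The BYTE and WORD typedefs are assumptions (the FAQ does not define them), and ReadHeaderChecked() is a hypothetical name; byte order is deliberately ignored here.

```c
#include <stdio.h>

typedef unsigned char  BYTE;  /* assumption: 8 bits  */
typedef unsigned short WORD;  /* assumption: 16 bits */

typedef struct HeaderTag
{
    BYTE Id;
    WORD Height;
    WORD Width;
    BYTE Colors;
} HEADER;

/*
 * Read the header one field at a time, so structure padding never
 * reaches the file. Returns 1 if every field was read, or 0 on a
 * NULL argument, a short read, or an I/O error.
 */
int ReadHeaderChecked(FILE *fp, HEADER *h)
{
    if (fp == NULL || h == NULL)
        return 0;
    if (fread(&h->Id, sizeof(h->Id), 1, fp) != 1)
        return 0;
    if (fread(&h->Height, sizeof(h->Height), 1, fp) != 1)
        return 0;
    if (fread(&h->Width, sizeof(h->Width), 1, fp) != 1)
        return 0;
    if (fread(&h->Colors, sizeof(h->Colors), 1, fp) != 1)
        return 0;
    return 1;
}
```

A caller would test the result, for example if (!ReadHeaderChecked(fp, &Header)) followed by whatever error handling suits the program.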
-
- But is there still anything we've overlooked? Will fread() (fscanf(),
- fgetc(), and so forth) always return the data we expect? Will fwrite()
- (fprintf(), fputc(), and so forth) ever write data that we don't want, or in
- a way we don't expect? Read on to the next section...
-
- ------------------------------
-
- Subject: 1. What's this business about endianness?
-
- So you've been pulling your hair out trying to discover why your elegant and
- perfect-beyond-reproach code, running on your Macintosh or Sun, is reading
- garbage from PCX and TGA files. Or perhaps your MS-DOS or Windows
- application just can't seem to make heads or tails out of that Sun Raster
- file. And, to make matters even more mysterious, it seems your most
- illustrious creation will read some TIFF files, but not others.
-
- As was hinted at in the previous section, just reading the header of a
- graphics file one field at a time is not enough to ensure data is always read
- correctly (not enough for portable code, anyway). In addition to structure
- alignment, we must also consider the endianness of the file's data and the
- endianness of the architecture our code is running on.
-
- Here are some baseline rules to follow:
-
- 1) Graphics files typically use a fixed byte-ordering scheme. For example,
- PCX and TGA files are always little-endian; Sun Raster and Macintosh
- PICT are always big-endian.
- 2) Graphics files that may contain data using either byte-ordering scheme
- (for example TIFF) will have an identifier that indicates the
- endianness of the data.
- 3) ASCII-based graphics files (such as DXF and most 3D object files),
- have no endianness and are always read in the same way on any system.
- 4) Most CPUs use a fixed byte-ordering scheme. For example, the 80486
- is little-endian and the 68040 is big-endian.
- 5) You can test for the type of endianness of a system using software.
- 6) There are many systems that are neither big- nor little-endian; these
- middle-endian systems will possibly cause such byte-order detection
- tests to return erroneous results.
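
As an illustration of rule 2, a TIFF file begins with the two bytes "II" (49h 49h) when its data is little-endian and "MM" (4Dh 4Dh) when it is big-endian. A check for that identifier could be sketched as follows. TiffByteOrder() and the ORDER_ constants are hypothetical names, and the function assumes the caller has already read the first bytes of the file into a buffer.

```c
#include <stddef.h>  /* size_t, NULL */

enum { ORDER_UNKNOWN = 0, ORDER_LITTLE = 1, ORDER_BIG = 2 };

/*
 * Inspect the first two bytes of a TIFF file.
 * buf points at the start of the file data, len is the number of
 * valid bytes in buf. Returns one of the ORDER_ constants.
 */
int TiffByteOrder(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 2)
        return ORDER_UNKNOWN;
    if (buf[0] == 0x49 && buf[1] == 0x49)  /* "II" */
        return ORDER_LITTLE;
    if (buf[0] == 0x4D && buf[1] == 0x4D)  /* "MM" */
        return ORDER_BIG;
    return ORDER_UNKNOWN;
}
```

The result tells the reader which byte order to use for every multi-byte value that follows in that particular file.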
-
- Now we know that using fread() on a big-endian system to read data from a
- file that was originally written in little-endian order will return incorrect
- data. Actually, the data is correct, but the bytes that make up the data are
- arranged in the wrong order. If we attempt to read the 16-bit value 1234h
- from a little-endian file, it would be stored in memory using the big-endian
- byte-ordering scheme and the value 3412h would result. What we need is a swap
- function to change the resulting position of the bytes:
-
- WORD SwapTwoBytes(WORD w)
- {
- register WORD tmp;
- tmp = (w & 0x00FF);
- tmp = ((w & 0xFF00) >> 0x08) | (tmp << 0x08);
- return(tmp);
- }
-
- Now we can read a two-byte header value and swap the bytes as such:
-
- fread(&Header.Height, sizeof(Header.Height), 1, fp);
- Header.Height = SwapTwoBytes(Header.Height);
-
- But what about four-byte values? The value 12345678h would be stored as
- 78563412h. What we need is a swap function to handle four-byte values:
-
- DWORD SwapFourBytes(DWORD dw)
- {
- register DWORD tmp;
- tmp = (dw & 0x000000FF);
- tmp = ((dw & 0x0000FF00) >> 0x08) | (tmp << 0x08);
- tmp = ((dw & 0x00FF0000) >> 0x10) | (tmp << 0x08);
- tmp = ((dw & 0xFF000000) >> 0x18) | (tmp << 0x08);
- return(tmp);
- }
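
The swap functions above depend on WORD being exactly 16 bits and DWORD exactly 32 bits, which this FAQ does not spell out. On a compiler that provides <stdint.h>, the same swaps could be sketched with fixed-width types. Swap16() and Swap32() are hypothetical names, not part of the original code.

```c
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value: 1234h becomes 3412h. */
uint16_t Swap16(uint16_t w)
{
    return (uint16_t) ((w >> 8) | (w << 8));
}

/* Reverse the four bytes of a 32-bit value: 12345678h becomes 78563412h. */
uint32_t Swap32(uint32_t dw)
{
    return ((dw & 0x000000FFu) << 24) |
           ((dw & 0x0000FF00u) <<  8) |
           ((dw & 0x00FF0000u) >>  8) |
           ((dw & 0xFF000000u) >> 24);
}
```

Swapping twice gives back the original value, which makes a handy sanity check.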
-
- But how do we know when to swap and when not to swap? We always know the
- byte-order of a graphics file that we are reading, but how do we check the
- endianness of the system we are running on? Using the C language, we might
- use preprocessor switches to cause a conditional compile based on a system
- definition flag:
-
- #define MSDOS 1
- #define WINDOWS 2
- #define MACINTOSH 3
- #define AMIGA 4
- #define SUNUNIX 5
-
- #define SYSTEM MSDOS
-
- #if SYSTEM == MSDOS
- // Little-endian code here
- #elif SYSTEM == WINDOWS
- // Little-endian code here
- #elif SYSTEM == MACINTOSH
- // Big-endian code here
- #elif SYSTEM == AMIGA
- // Big-endian code here
- #elif SYSTEM == SUNUNIX
- // Big-endian code here
- #else
- #error Unknown SYSTEM definition
- #endif
-
- My reaction to the above code was *YUCK!* (and I hope yours was too!). A
- snarl of fread(), fwrite(), SwapTwoBytes(), and SwapFourBytes() functions
- laced between preprocessor statements is hardly elegant code, although
- sometimes it is our best choice. Fortunately, this is not one of those times.
-
- What we first need is a set of functions to read the data from a file using
- the byte-ordering scheme of the data. This effectively combines the read/write
- and swap operations into one set of functions. Consider the following:
-
- WORD GetBigWord(FILE *fp)
- {
- register WORD w;
- w = (WORD) (fgetc(fp) & 0xFF);
- w = ((WORD) (fgetc(fp) & 0xFF)) | (w << 0x08);
- return(w);
- }
-
- WORD GetLittleWord(FILE *fp)
- {
- register WORD w;
- w = (WORD) (fgetc(fp) & 0xFF);
- w |= ((WORD) (fgetc(fp) & 0xFF) << 0x08);
- return(w);
- }
-
- DWORD GetBigDoubleWord(FILE *fp)
- {
- register DWORD dw;
- dw = (DWORD) (fgetc(fp) & 0xFF);
- dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);
- dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);
- dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);
- return(dw);
- }
-
- DWORD GetLittleDoubleWord(FILE *fp)
- {
- register DWORD dw;
- dw = (DWORD) (fgetc(fp) & 0xFF);
- dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x08);
- dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x10);
- dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x18);
- return(dw);
- }
-
- void PutBigWord(WORD w, FILE *fp)
- {
- fputc((w >> 0x08) & 0xFF, fp);
- fputc(w & 0xFF, fp);
- }
-
- void PutLittleWord(WORD w, FILE *fp)
- {
- fputc(w & 0xFF, fp);
- fputc((w >> 0x08) & 0xFF, fp);
- }
-
- void PutBigDoubleWord(DWORD dw, FILE *fp)
- {
- fputc((dw >> 0x18) & 0xFF, fp);
- fputc((dw >> 0x10) & 0xFF, fp);
- fputc((dw >> 0x08) & 0xFF, fp);
- fputc(dw & 0xFF, fp);
- }
-
- void PutLittleDoubleWord(DWORD dw, FILE *fp)
- {
- fputc(dw & 0xFF, fp);
- fputc((dw >> 0x08) & 0xFF, fp);
- fputc((dw >> 0x10) & 0xFF, fp);
- fputc((dw >> 0x18) & 0xFF, fp);
- }
-
- If we were reading a little-endian file on a big-endian system (or vice
- versa), the previous code:
-
- fread(&Header.Height, sizeof(Header.Height), 1, fp);
- Header.Height = SwapTwoBytes(Header.Height);
-
- Would be replaced by:
-
- Header.Height = GetLittleWord(fp);
-
- The code to write the same value to a file would be changed from:
-
- Header.Height = SwapTwoBytes(Header.Height);
- fwrite(&Header.Height, sizeof(Header.Height), 1, fp);
-
- To the slightly more readable:
-
- PutLittleWord(Header.Height, fp);
-
- Note that these functions are the same regardless of the endianness of a
- system. For example, GetLittleWord() will always read a two-byte value
- from a little-endian file regardless of the endianness of the system;
- PutBigDoubleWord() will always write a four-byte big-endian value, and so
- forth.
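
The same idea also works on a block of bytes that has already been read into memory, which sidesteps both structure padding and host byte order in one step. The sketch below decodes the six-byte example header from a buffer, treating its two-byte fields as little-endian. That byte order is an assumption made for the example (the FAQ's HEADER is not tied to a real format), as are the BYTE and WORD typedefs and the names LittleWordFromBytes() and DecodeLittleHeader().

```c
#include <stddef.h>  /* size_t, NULL */

typedef unsigned char  BYTE;  /* assumption: 8 bits  */
typedef unsigned short WORD;  /* assumption: 16 bits */

typedef struct HeaderTag
{
    BYTE Id;
    WORD Height;
    WORD Width;
    BYTE Colors;
} HEADER;

/* Size of the header as stored in the file: 1 + 2 + 2 + 1 bytes. */
#define HEADER_FILE_SIZE 6

/* Build a 16-bit value from two bytes stored low byte first. */
static WORD LittleWordFromBytes(const unsigned char *p)
{
    return (WORD) (p[0] | (p[1] << 8));
}

/*
 * Decode a header from the raw bytes of a file.
 * Returns 1 on success, 0 if an argument is NULL or the buffer
 * holds fewer than HEADER_FILE_SIZE bytes.
 */
int DecodeLittleHeader(const unsigned char *buf, size_t len, HEADER *h)
{
    if (buf == NULL || h == NULL || len < HEADER_FILE_SIZE)
        return 0;
    h->Id     = buf[0];
    h->Height = LittleWordFromBytes(buf + 1);
    h->Width  = LittleWordFromBytes(buf + 3);
    h->Colors = buf[5];
    return 1;
}
```

Reading the whole header with one fread() into an unsigned char array and then decoding it keeps the number of I/O calls down while still ignoring the compiler's alignment settings.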
-
- ------------------------------
-
- Subject: 2. How can I determine the byte-order of a system at run-time?
-
- You may wish to optimize how you read (or write) data from a graphics file
- based on the endianness of your system. Using the GetBigDoubleWord() function
- mentioned in the previous section to read big-endian data from a file on a
- big-endian system imposes extra overhead we don't really need (although if
- the actual number of read/write operations in your program is small you might
- not consider this overhead to be too bad).
-
- If our code could tell what the endianness of the system was at run-time, it
- could choose (using function pointers) what set of read/write functions to
- use. Look at the following function:
-
- #define BIG_ENDIAN 0
- #define LITTLE_ENDIAN 1
-
- int TestByteOrder(void)
- {
- short int w = 0x0001;
- char *byte = (char *) &w;
- return(byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN);
- }
-
- This code assigns the value 0001h to a 16-bit integer. A char pointer is then
- assigned to point at the first byte of the integer (the byte at the lowest
- address). If that byte is 01h, then the system is little-endian (the
- least-significant byte is stored at the lowest address). If it is 00h then
- the system is big-endian.
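
The function-pointer idea mentioned above might be sketched like this. To keep the example short it decodes a big-endian 16-bit value from bytes already in memory rather than from a FILE pointer, and it uses <stdint.h> types instead of WORD. Every name here (WordDecoder, HostIsLittleEndian(), SelectBigWordDecoder(), and the two static helpers) is hypothetical.

```c
#include <stdint.h>
#include <string.h>  /* memcpy */

/* Pointer to a function that decodes a 16-bit value from two bytes. */
typedef uint16_t (*WordDecoder)(const unsigned char *p);

/* Returns 1 on a little-endian host, 0 otherwise. */
int HostIsLittleEndian(void)
{
    uint16_t w = 0x0001;
    unsigned char first;
    memcpy(&first, &w, 1);  /* copy the byte at the lowest address */
    return first == 0x01;
}

/* Works on any host: assemble the value one byte at a time. */
static uint16_t BigWordPortable(const unsigned char *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Correct only when the host byte order matches the data: plain copy. */
static uint16_t WordNative(const unsigned char *p)
{
    uint16_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Choose, once at run-time, how to decode big-endian words. */
WordDecoder SelectBigWordDecoder(void)
{
    return HostIsLittleEndian() ? BigWordPortable : WordNative;
}
```

A program would call SelectBigWordDecoder() once at start-up, store the result in a variable, and call through that pointer for every big-endian word it decodes. Whether this saves any measurable time is something to profile rather than assume.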
-
- ------------------------------
-
- Subject: 3. How can I identify the format of a graphics file?
-
- When writing any type of file or data stream reader it is very important to
- implement some sort of method for verifying that the input data is in the
- format you expect. Here are a few methods:
-
- 1) Trust the user of your program to always supply the correct data, thereby
- freeing you from the tedious task of writing any type of format
- identification routines. Choose this method and you will provide solid proof
- that contradicts the popular claim that users are inherently far more stupid
- than programmers.
-
- 2) Read the file extension or descriptor. A GIF file will always have the
- extension .GIF, right? Targa files .TGA, yes? And TIFF files will have
- an extension of .TIF or a descriptor of TIFF. So no problem?
-
- Well, for the most part, this is true. This method certainly isn't
- bulletproof, however. Your reader will occasionally be fed the odd batch of
- mislabeled files ("I thought they were PCX files!"). Or files with
- unrecognized, mangled extensions (.TAR rather than .TGA or .JFI rather than
- .JPG) that your reader knows how to read, but won't read because it doesn't
- recognize the extensions. File extensions also won't usually tell you the
- revision of the file format you are reading (with some revisions creating an
- almost entirely new format). And more than one file format shares the more
- common file extensions (such as .IMG and .PIC). And last of all, data streams
- have no file extensions or descriptors to read at all.
-
- 3) Read the file and attempt to recognize the format by specific patterns in
- the data. Most file formats contain some sort of identifying pattern of data
- that is identical in all files. In some cases this pattern gives an
- indication of the revision of the format (such as GIF87a and GIF89a) or
- the endianness of the data format.
-
- Nothing is easy, however. Not all formats contain such identifiers (such as
- PCX). And those that do don't necessarily put them at the beginning of the
- file. This means if the data is in the format of a stream you may have to
- read (and buffer) most or all of the data before you can determine the
- format. Of course, not all graphics formats are suitable to be read as a data
- stream anyway.
-
- Your best bet for a method of format detection is a combination of methods
- two and three. First believe the file extension or descriptor, read some
- data, and check for identifying data patterns. If this test fails, then
- attempt to recognize all other known patterns.
-
- Run-time file format identification is a black art at best.
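
The pattern-matching half of that approach (method three) can be sketched for a few formats whose identifiers sit at the very start of the file: GIF begins with the ASCII text "GIF87a" or "GIF89a", TIFF with "II" 2Ah 00h or "MM" 00h 2Ah, and Sun Raster with the bytes 59h A6h 6Ah 95h. The FORMAT enum and IdentifyFormat() are hypothetical names, and a real reader would know many more patterns than these.

```c
#include <stddef.h>  /* size_t, NULL */
#include <string.h>  /* memcmp */

typedef enum
{
    FMT_UNKNOWN,
    FMT_GIF87A,
    FMT_GIF89A,
    FMT_TIFF,
    FMT_SUNRASTER
} FORMAT;

/*
 * Guess a file's format from its first bytes.
 * buf holds the start of the file, len is the number of valid bytes.
 */
FORMAT IdentifyFormat(const unsigned char *buf, size_t len)
{
    static const unsigned char gif87[6]  = {0x47, 0x49, 0x46, 0x38, 0x37, 0x61};
    static const unsigned char gif89[6]  = {0x47, 0x49, 0x46, 0x38, 0x39, 0x61};
    static const unsigned char tiffLE[4] = {0x49, 0x49, 0x2A, 0x00};
    static const unsigned char tiffBE[4] = {0x4D, 0x4D, 0x00, 0x2A};
    static const unsigned char sunRas[4] = {0x59, 0xA6, 0x6A, 0x95};

    if (buf == NULL)
        return FMT_UNKNOWN;
    if (len >= 6 && memcmp(buf, gif87, 6) == 0)
        return FMT_GIF87A;
    if (len >= 6 && memcmp(buf, gif89, 6) == 0)
        return FMT_GIF89A;
    if (len >= 4 && (memcmp(buf, tiffLE, 4) == 0 || memcmp(buf, tiffBE, 4) == 0))
        return FMT_TIFF;
    if (len >= 4 && memcmp(buf, sunRas, 4) == 0)
        return FMT_SUNRASTER;
    return FMT_UNKNOWN;
}
```

A reader could try the format suggested by the file extension first and fall back to a routine like this one when that guess fails.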
-
- ------------------------------
-
- Subject: III. Kudos and Assertions
-
- ------------------------------
-
- Subject: 0. Acknowledgments
-
- Nobody yet.
-
- Doesn't anybody have any neat tricks to share?
-
- ------------------------------
-
- Subject: 1. About The Author
-
- The author of this FAQ, James D. Murray, lives in the City of Orange, Orange
- County, California, USA. He is the co-author of the book Encyclopedia of
- Graphics File Formats published by O'Reilly and Associates, makes a passable
- living writing Microsoft Windows applications in C++, and may be reached as
- jdm@netcom.com, or via U.S. Snail at: P.O. Box 70, Orange, CA 92666-0070 USA.
-
- GCS d-- H++ s g- p? au+ a w+ v++ C+++(++++) US+++ p++>++++ L>++ 3 E--- N++ K-
- W---$ M-@ V-- po Y+ t++ 5-- j>x R+>-- G' tv-->! b+++ D++ B e- u* h- f r-->+++
- n++ y*(**)
-
- ------------------------------
-
- Subject: 2. Disclaimer
-
- While every effort has been taken to ensure the accuracy of the information
- contained in this FAQ list compilation, the author and contributors assume no
- responsibility for errors or omissions, or for damages resulting from the use
- of the information contained herein.
-
- ------------------------------
-
- Subject: 3. Copyright Notice
-
- This FAQ is Copyright (C) 1994-95 by James D. Murray. All Rights Reserved.
- This work may be reproduced, in whole or in part, using any medium,
- including, but not limited to, electronic transmission, CD-ROM, or published
- in print, under the condition that this copyright notice remains intact.
-
-